auction problem
Reviews: Learning Mean-Field Games
This paper considers learning in mean-field games (MFG). MFGs take the limit of an infinite number of agents, which are considered indistinguishable. Based on a motivating example consisting of a repeated Ad auction problem, the authors introduce a "general" mean-field game (GMFG), a model-free version of the standard MFG. The authors revisit standard Q-Learning and a soft version of it, and provide convergence and complexity results of such an algorithm. These methods are compared numerically on the auction problem together with a recently proposed approach and show better performance.
Adaptive Oracle-Efficient Online Learning
Wang, Guanghui, Hu, Zihao, Muthukumar, Vidya, Abernethy, Jacob
The classical algorithms for online learning and decision-making have the benefit of achieving the optimal performance guarantees, but suffer from computational complexity limitations when implemented at scale. More recent sophisticated techniques, which we refer to as oracle-efficient methods, address this problem by dispatching to an offline optimization oracle that can search through an exponentially-large (or even infinite) space of decisions and select that which performed the best on any dataset. But despite the benefits of computational feasibility, oracle-efficient algorithms exhibit one major limitation: while performing well in worst-case settings, they do not adapt well to friendly environments. In this paper we consider two such friendly scenarios, (a) "small-loss" problems and (b) IID data. We provide a new framework for designing follow-the-perturbed-leader algorithms that are oracle-efficient and adapt well to the small-loss environment, under a particular condition which we call approximability (which is spiritually related to sufficient conditions provided by Dud\'{i}k et al., [2020]). We identify a series of real-world settings, including online auctions and transductive online classification, for which approximability holds. We also extend the algorithm to an IID data setting and establish a "best-of-both-worlds" bound in the oracle-efficient setting.
Online Auctions and Multi-scale Online Learning
Bubeck, Sébastien, Devanur, Nikhil R., Huang, Zhiyi, Niazadeh, Rad
We consider revenue maximization in online auctions and pricing. A seller sells an identical item in each period to a new buyer, or a new set of buyers. For the online posted pricing problem, we show regret bounds that scale with the best fixed price, rather than the range of the values. We also show regret bounds that are almost scale free, and match the offline sample complexity, when comparing to a benchmark that requires a lower bound on the market share. These results are obtained by generalizing the classical learning from experts and multi-armed bandit problems to their multi-scale versions. In this version, the reward of each action is in a different range, and the regret w.r.t. a given action scales with its own range, rather than the maximum range.